Abstract. We propose a model for the World Wide Web graph that
couples the topological growth with the traffic's dynamical evolution. 
The model is based on a simple traffic-driven dynamics and generates 
weighted directed graphs exhibiting the statistical properties observed 
in the Web. In particular, the model yields a non-trivial time evolution 
of vertices and heavy-tail distributions for the topological and traffic 
properties. The generated graphs exhibit a complex architecture with a 
hierarchy of cohesiveness levels similar to those observed in the analysis 
of real data. 



1 Introduction 

The World Wide Web (WWW) has evolved into an immense and intricate struc- 
ture whose understanding represents a major scientific and technological chal- 
lenge. A fundamental step in this direction is taken with the experimental studies 
of the WWW graph structure in which vertices and directed edges are identified 
with web-pages and hyperlinks, respectively. These studies are based on crawlers 
that explore the WWW connectivity by following the links on each discovered 
page, thus reconstructing the topological properties of the representative graph. 
In particular, data gathered in large scale crawls [1,2,3,4,5] have uncovered the
presence of a complex architecture underlying the structure of the WWW graph. 
A first observation is the small-world property [H] which means that the average 
distance between two vertices (measured by the length of the shortest path) is 
very small. Another important result is that the WWW exhibits a power-law 
relationship between the frequency of vertices and their degree, defined as the 
number of directed edges linking each vertex to its neighbors. This last feature 
is the signature of a very complex and heterogeneous topology with statistical 
fluctuations extending over many length scales p^. 

These complex topological properties are not exclusive to the WWW and are 
encountered in a wide range of networked structures belonging to very different 
domains such as ecology, biology, social and technological systems [7,8,9,10].
The need for general principles explaining the emergence of complex topological
features in very diverse systems has led to a wide array of models aimed at
capturing various properties of real networks [7,9,10], including the WWW. Models
do however generally consider only the topological structure and do not take
into account the interaction strength (the weight of the link) that characterizes
real networks [11,12,13,14,15,16]. Interestingly, recent studies of various types
of weighted networks [15,17] have shown additional complex properties such as
broad distributions and non-trivial correlations of weights that do not find an 
explanation just in terms of the underlying topological structure. In the case of 
the WWW, it has also been recognized that the complexity of the network en- 
compasses not only its topology but also the dynamics of information. Examples 
of this complexity are navigation patterns, community structures, congestions, 
and other social phenomena resulting from the users' behavior [18,19]. In
addition, Adamic and Huberman [4] pointed out that the number of users of a
web-site is broadly distributed, showing the relevance and heterogeneity of the 
traffic carried by the WWW. 

In this work we propose a simple model for the WWW graph that takes 
into account the traffic (number of visitors) on the hyper-links and considers the
basic dynamical evolution of the system as being driven by the traffic properties
of web-pages and hyperlinks. The model also mimics the natural evolution and 
reinforcements of interactions in the Web by allowing the dynamical evolution 
of weights during the system growth. The model displays power-law behavior for 
the different quantities, with non-trivial exponents whose values depend on the 
model's parameters and which are close to the measured ones. Strikingly, the 
model recovers a heavy-tailed out-traffic distribution whatever the out-degree 
distribution. Finally we find non-trivial clustering properties signaling the pres- 
ence of hierarchy and correlations in the graph architecture, in agreement with 
what is observed in real data of the WWW. 

1.1 Related works: Existing models for the web 

It was realized early on that the traditional random graph model, i.e. the
Erdős-Rényi paradigm, fails to reproduce the topological features found in the
WebGraph such as the broad degree probability distribution, and to provide a
model for a dynamically growing network. An important step in the modeling of
evolving networks was taken by Barabási et al. [1,20] who proposed the
ingredient of preferential attachment: at each time-step, a new vertex is
introduced and connects randomly to already present vertices with a probability
proportional to their degree. The combined ingredients of growth and preferential
attachment naturally lead to power-law distributed degrees. Numerous variations
of this model have been formulated ^U] to include different features such as 
rewiring [21,22], additional edges, directionality [23,24], fitness [21] or limited
information [26].

A very interesting class of models that considers the main features of the 
WWW growth has been introduced by Kumar et al. [H] in order to produce a 
mechanism which does not assume the knowledge of the degree of the existing 
vertices. Each newly introduced vertex $n$ selects at random an already existing
vertex $p$; for each out-neighbour $j$ of $p$, $n$ connects to $j$ with a certain probability
$a$; with probability $1 - a$ it connects instead to another randomly chosen node.



This model describes the growth process of the WWW as a copy mechanism in
which newly arriving web-pages tend to reproduce the hyperlinks of similar
web-pages, i.e. the first to which they connect. Interestingly, this model effectively
recovers a preferential attachment mechanism without explicitly introducing it.
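
The copying rule described above can be illustrated with a minimal Python sketch. It is not the implementation of Kumar et al.: the function name `copy_model`, the parameter `copy_prob` (playing the role of the probability $a$) and the convention that seed pages are labelled 0, 1, 2, ... are assumptions made only for this illustration.

```python
import random

def copy_model(n_nodes, copy_prob, seed_edges, rng=None):
    """Grow a directed graph with a copying rule (illustrative sketch).

    seed_edges: dict mapping each seed page (labelled 0, 1, 2, ...) to the
    list of its out-neighbours. Returns a dict node -> list of out-neighbours.
    """
    rng = rng or random.Random(0)
    out = {u: list(vs) for u, vs in seed_edges.items()}
    nodes = list(out)
    for n in range(len(nodes), n_nodes):
        p = rng.choice(nodes)                      # prototype page, chosen uniformly
        targets = []
        for j in out[p]:
            if rng.random() < copy_prob:
                targets.append(j)                  # copy the prototype's hyperlink
            else:
                targets.append(rng.choice(nodes))  # link to a random existing page
        out[n] = targets
        nodes.append(n)
    return out
```

Pages that already receive many links are more likely to appear among the out-neighbours of the chosen prototype, which is how the rule favours popular pages without ever reading their degree.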

Other proposals in the WWW modeling include the use of the rank values 
computed by the PageRank algorithm used in search engines, combined with the 
preferential attachment ingredient [27], or multilayer models grouping web-pages
in different regions [28], in order to obtain bipartite cliques in the network. Finally,
recent models include the textual content affinity j^H] as the main ingredient of 
the WWW evolution. 

2 Weighted model of the WWW graph 

2.1 The WWW graph 

The WWW network can be mathematically represented as a directed graph
$\mathcal{G} = (V, E)$ where $V$ is the set of nodes which are the web-pages and where $E$
is the set of ordered edges $(i, j)$ which are the directed hyperlinks ($i, j = 1, \ldots, N$
where $N = |V|$ is the size of the network). Each node $i \in V$ has thus an
ensemble $V_{in}(i)$ of pages pointing to $i$ (in-neighbours) and another set $V_{out}(i)$
of pages directly accessible from $i$ (out-neighbours). The degree $k(i)$ of a node
is divided into in-degree $k^{in}(i) = |V_{in}(i)|$ and out-degree $k^{out}(i) = |V_{out}(i)|$:
$k(i) = k^{in}(i) + k^{out}(i)$. The WWW has also dynamical features in that $\mathcal{G}$ is
growing in time, with a continuous creation of new nodes and links. Empirical
evidence shows that the distribution of the in-degrees of vertices follows a
power-law behavior. Namely, the probability that a node $i$ has in-degree
$k^{in}$ behaves as $P(k^{in}) \sim (k^{in})^{-\gamma_{in}}$, with $\gamma_{in} = 2.1 \pm 0.1$ as indicated by the
largest data sample [1,2,4,5]. The out-degree ($k^{out}$) distribution of web-pages is
also broad but with an exponential cut-off, as recent data suggest [2,5]. While
the in-degree represents the sum of all hyper-links coming from the whole WWW
and can be in principle as large as the WWW itself, the out-degree is determined
by the number of hyper-links present in a single web-page and is thus constrained
by obvious physical elements.

2.2 Weights and Strengths 

The number of users of any given web-site is also distributed according to a
heavy-tail distribution [4]. This fact demonstrates the relevance of considering
that every hyper-link has a specific weight that represents the number of users
who are using it. The WebGraph $\mathcal{G}(V, E)$ is thus a directed, weighted graph
where the directed edges have assigned variables $w_{ij}$ which specify the weight
on the edge connecting vertex $i$ to vertex $j$ ($w_{ij} = 0$ if there is no edge pointing
from $i$ to $j$). The standard topological characterization of directed networks is
obtained by the analysis of the probability distribution $P(k^{in})$ [$P(k^{out})$] that a
vertex has in-degree $k^{in}$ [out-degree $k^{out}$]. Similarly, a first characterization of
weights is obtained by the distribution $P(w)$ that any given edge has weight $w$.
Along with the degree of a node, a very significant measure of the network
properties in terms of the actual weights is obtained by looking at the vertex
incoming and outgoing strength defined as [30,15]

$$ s_i^{out} = \sum_{j \in V_{out}(i)} w_{ij}, \qquad s_i^{in} = \sum_{j \in V_{in}(i)} w_{ji}, \qquad (1) $$

and the corresponding distributions $P(s^{in})$ and $P(s^{out})$. The strengths $s_i^{in}$ and
$s_i^{out}$ of a node integrate the information about its connectivity and the
importance of the weights of its links, and can be considered as the natural
generalization of the degree. For the Web the incoming strength represents the actual total
traffic arriving at web-page $i$ and is an obvious measure of the popularity and
importance of each web-page. The incoming strength obviously increases with
the vertex in-degree $k_i^{in}$ and usually displays the power-law behavior $s \sim k^{\beta}$,
with the exponent $\beta$ depending on the specific network [15].
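
As a small illustration of Eq. (1), the sketch below accumulates the in- and out-strengths from a dictionary of edge weights. The function name `strengths`, the dictionary layout `{(i, j): w_ij}` and the toy weights are assumptions made for this example only.

```python
from collections import defaultdict

def strengths(weights):
    """Return (s_in, s_out) for a weighted directed graph.

    weights: dict {(i, j): w_ij} for each hyperlink i -> j.
    """
    s_in = defaultdict(float)
    s_out = defaultdict(float)
    for (i, j), w in weights.items():
        s_out[i] += w   # traffic leaving page i
        s_in[j] += w    # traffic arriving at page j
    return dict(s_in), dict(s_out)

# Toy example with three pages (made-up weights).
w = {("a", "b"): 2.0, ("a", "c"): 1.0, ("c", "b"): 0.5}
s_in, s_out = strengths(w)
```

Here page "b" collects the traffic of both of its incoming links, so its in-strength is larger than its in-degree alone would suggest.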

2.3 The model 

Our goal is to define a model of a growing graph that explicitly takes into account
the actual popularity of web-pages as measured by the number of users visiting
them. Starting from an initial seed of $N_0$ pages, a new node (web-page) $n$ is
introduced in the system at each time-step and generates $m$ outgoing
hyper-links. In this study, we take $m$ fixed so that the out-degree distribution is a
delta function. This choice is motivated by the empirical observation that the
distribution of the number of outgoing links is bounded [5] and we have checked
that the results do not depend on the precise form of this distribution as long
as $P(k^{out}(i) = k)$ decays faster than any power-law as $k$ grows.
The new node $n$ is attached to a node $i$ with probability

$$ \mathrm{Prob}(n \to i) = \frac{s_i^{in}}{\sum_j s_j^{in}} \qquad (2) $$

and the new link $n \to i$ has a weight $w_{ni} = w_0$. This choice relaxes the usual
degree preferential attachment and focuses on the popularity-driven (or
strength-driven) attachment in which new web-pages will connect more likely to
web-pages handling larger traffic. This appears to be a plausible mechanism in the WWW and
in many other technological networks. For instance, in the Internet new routers
connect to other routers with large bandwidth and traffic handling capabilities.
In the airport network, new connections (airlines) are generally established with
airports having a large passenger traffic [15,31,32]. The new vertex is assumed
to have its own initial incoming strength $s_n^{in} = w_0$ in order to give the vertex an
initial non-vanishing probability to be chosen by vertices arriving at later time
steps.

Fig. 1. Illustration of the construction rule. A new web-page $n$ enters the Web
and directs a hyper-link to a node $i$ with probability proportional to $s_i^{in} / \sum_j s_j^{in}$.
The weight of the new hyper-link is $w_0$ and the existing traffic on the outgoing links
of $i$ is modified by a total amount equal to $\delta_i$: $s_i^{out} \to s_i^{out} + \delta_i$.

The second and determining ingredient of the model consists in considering
that a new connection ($n \to i$) will introduce variations of the traffic across the
network. For the sake of simplicity we limit ourselves to the case where the
introduction of a new incoming link on node $i$ will trigger only local rearrangements
of weights on the existing links ($i \to j$) where $j \in V_{out}(i)$ as

$$ w_{ij} \to w_{ij} + \Delta w_{ij}, \qquad (3) $$

where $\Delta w_{ij}$ is a function of $w_{ij}$ and of the connectivities and strengths of $i$.
In the following we focus on the case where the addition of a new edge with
weight $w_0$ induces a total increase $\delta_i$ of the total outgoing traffic and where this
perturbation is proportionally distributed among the edges according to their
weights [see Fig. 1]:

$$ \Delta w_{ij} = \delta_i \frac{w_{ij}}{s_i^{out}}. \qquad (4) $$

This process reflects the fact that new visitors of a web-page will usually use its
hyper-links and thus increase its outgoing traffic. This in turn will increase the
popularity of the web-pages pointed to by the hyperlinks. In this way the
popularity of each page increases not only because of direct links pointing to it but also
due to the increased popularity of its in-neighbors. It is possible to consider
heterogeneous $\delta_i$ distributions depending on the local dynamics and rearrangements
specific to each vertex, but for the sake of simplicity we consider the model with
$\delta_i = \delta$. We finally note that the quantity $w_0$ sets the scale of the weights. We can
therefore use the rescaled quantities $w_{ij}/w_0$, $s_i/w_0$ and $\delta/w_0$, or equivalently set
$w_0 = 1$. The model then depends only on the dimensionless parameter $\delta$. The
generalization to arbitrary $w_0$ is simply obtained by replacing $\delta$, $w_{ij}$, $s_i^{out}$ and
$s_i^{in}$ respectively by $\delta/w_0$, $w_{ij}/w_0$, $s_i^{out}/w_0$ and $s_i^{in}/w_0$ in all results.
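
The growth rule of Eqs. (2)-(4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulation code: the function name `grow_traffic_graph` is hypothetical, seed pages are assumed to start with in-strength $w_0$ and no outgoing links, the $m$ targets of a new page are assumed to be distinct, and the attachment probabilities are assumed to be frozen while the $m$ targets of one time-step are drawn.

```python
import random

def grow_traffic_graph(n_total, m=2, delta=0.5, n0=3, w0=1.0, seed=0):
    """Strength-driven growth with local weight reinforcement (sketch)."""
    if n0 < m:
        raise ValueError("need at least m seed pages to pick m distinct targets")
    rng = random.Random(seed)
    s_in = [w0] * n0                    # assumption: seed pages start with s_in = w0
    s_out = [0.0] * n0                  # assumption: seed pages have no out-links
    out_nbrs = [[] for _ in range(n0)]
    weights = {}                        # (i, j) -> w_ij for the link i -> j
    for n in range(n0, n_total):
        # Eq. (2): draw targets with probability s_in[i] / sum_j s_in[j]
        targets = []
        while len(targets) < m:
            i = rng.choices(range(n), weights=s_in)[0]
            if i not in targets:
                targets.append(i)
        for i in targets:
            # Eqs. (3)-(4): the out-traffic of i grows by delta, shared among
            # its outgoing links in proportion to their current weights
            if out_nbrs[i]:
                total = s_out[i]
                for j in out_nbrs[i]:
                    dw = delta * weights[(i, j)] / total
                    weights[(i, j)] += dw
                    s_in[j] += dw
                s_out[i] = total + delta
            weights[(n, i)] = w0        # the new hyper-link n -> i
            s_in[i] += w0
        s_in.append(w0)                 # initial in-strength of the new page
        s_out.append(m * w0)
        out_nbrs.append(targets)
    return weights, s_in, s_out
```

Drawing each target with `random.choices` costs a time proportional to the current graph size, which is adequate for small illustrative runs; sizes of the order of those used in the figures would call for a faster sampling structure.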

2.4 Analytical solution 

Starting from an initial seed of $N_0$ nodes, the network grows with the addition of
one node per unit time, until it reaches its final size $N$. In the model, every node
has exactly $m$ outgoing links with the same weight $w_0 = 1$. During the growth
process this symmetry is conserved and at all times we have $s_i^{out} = m w_{ij}$. Indeed,
each new incoming link generates a traffic reinforcement $\Delta w_{ij} = \delta/m$, so that
$w_{ij} = w_0 + k_i^{in} \delta/m$ is independent of $j$ and

$$ s_i^{out} = m + \delta k_i^{in}. \qquad (5) $$

The time evolution of the average of $s_i^{in}(t)$ and $k_i^{in}(t)$ of the $i$-th vertex at time
$t$ can be obtained by neglecting fluctuations and by relying on the continuous
approximation that treats connectivities, strengths, and time $t$ as continuous
variables [7,9,10]. The dynamical evolution of the in-strength of a node $i$ is given
by the evolution equation

$$ \frac{ds_i^{in}}{dt} = m \frac{s_i^{in}}{\sum_l s_l^{in}} + \sum_{j \in V_{in}(i)} m \frac{s_j^{in}}{\sum_l s_l^{in}} \frac{\delta}{m}, \qquad (6) $$

with initial condition $s_i^{in}(t = i) = 1$. This equation states that the incoming
strength of a vertex $i$ can only increase if a new hyper-link connects directly to $i$
(first term) or to a neighbor vertex $j \in V_{in}(i)$, thus inducing a reinforcement $\delta/m$
on the existing in-link (second term). Both terms are weighted by the
probability that the new vertex establishes a hyperlink with the corresponding existing
vertex. Analogously, we can write the evolution equation for the in-degree $k_i^{in}$
that evolves only if the new link connects directly to $i$:

$$ \frac{dk_i^{in}}{dt} = m \frac{s_i^{in}}{\sum_l s_l^{in}}. \qquad (7) $$

Finally, the out-degree is constant ($k_i^{out} = m$) by construction.

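The above equations can be written more explicitly by noting that the
addition of each new vertex and its $m$ out-links increases the total in-strength
of the graph by the constant quantity $1 + m + m\delta$, yielding at large times
$\sum_{i=1}^{t} s_i^{in} \approx m (1 + 1/m + \delta) t$. By inserting this relation in the evolution
equations (6) and (7) we obtain

$$ \frac{ds_i^{in}}{dt} = \frac{1}{\delta + 1 + 1/m} \left[ \frac{s_i^{in}}{t} + \frac{\delta}{m t} \sum_{j \in V_{in}(i)} s_j^{in} \right] \quad \text{and} \quad \frac{dk_i^{in}}{dt} = \frac{1}{\delta + 1 + 1/m} \, \frac{s_i^{in}}{t}. \qquad (8) $$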



These equations cannot be explicitly solved because of the term $\sum_{j \in V_{in}(i)} s_j^{in}$
which introduces a coupling of the in-strength of different vertices. The structure
of the equations and previous studies of similar undirected models [31,32] suggest
considering the Ansatz $s_i^{in} = A k_i^{in}$ in order to obtain an explicit solution. Using
Eq. (5), and $w_{ji} = s_j^{out}/m$, we can write

$$ s_i^{in} = \sum_{j \in V_{in}(i)} w_{ji} = k_i^{in} + \frac{\delta}{m} \sum_{j \in V_{in}(i)} k_j^{in} \qquad (9) $$

and the Ansatz $s_i^{in} = A k_i^{in}$ yields

$$ \sum_{j \in V_{in}(i)} s_j^{in} = \frac{m}{\delta} (A - 1) s_i^{in}. \qquad (10) $$



This yields a closed equation for $s_i^{in}$ whose solution is

$$ s_i^{in}(t) = \left( \frac{t}{i} \right)^{\theta}, \quad \text{with} \quad \theta = \frac{A}{\delta + 1 + 1/m} \qquad (11) $$

and $k_i^{in}(t) = s_i^{in}(t)/A$, satisfying the proposed Ansatz. The fact that vertices are
added at a constant rate implies that the probability distribution of $s_i^{in}$ is given

by \mmi:V2\ 

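$$ P(s^{in}, t) = \frac{1}{t + N_0} \int_0^t \delta(s^{in} - s_i^{in}(t)) \, di, \qquad (12) $$

where $\delta(x)$ is the Dirac delta function. By solving the above integral and
considering the infinite size limit $t \to \infty$ we obtain

$$ P(s^{in}) \sim (s^{in})^{-\gamma_s^{in}}, \quad \text{with} \quad \gamma_s^{in} = 1 + \frac{1}{\theta}. \qquad (13) $$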
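
The step from Eq. (12) to Eq. (13) can be spelled out as follows. This is only a sketch of the standard change of variables, written for $s \geq 1$ and using the solution $s_i^{in}(t) = (t/i)^{\theta}$ of Eq. (11); the symbol $i^*$ for the birth time selected by the delta function is introduced here for convenience.

```latex
% The delta function selects the birth time i^* for which s_i^{in}(t) = s:
i^* = t\, s^{-1/\theta}, \qquad
\left| \frac{\partial s_i^{in}(t)}{\partial i} \right|_{i = i^*}
  = \frac{\theta\, s}{i^*} = \frac{\theta}{t}\, s^{1 + 1/\theta}.
% Integrating the delta function divides by this Jacobian:
P(s, t) = \frac{1}{t + N_0}\, \frac{t}{\theta}\, s^{-(1 + 1/\theta)}
  \;\xrightarrow[t \to \infty]{}\; \frac{1}{\theta}\, s^{-(1 + 1/\theta)},
% which gives the exponent \gamma_s^{in} = 1 + 1/\theta.
```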

The quantities $s_i^{in}$, $k_i^{in}$ and $s_i^{out}$ are thus here proportional, so that their
probability distributions are given by power-laws with the same exponent $\gamma_s^{in} =
\gamma_s^{out} = \gamma_k^{in}$. The explicit value of the exponents depends on $\theta$ which itself is a
function of the proportionality constant $A$. In order to find an explicit value
of $A$ we use the approximation that on average the total in-weight will be
proportional to the number of in-links times the average weight in the graph
$\langle w \rangle = \frac{1}{tm} \sum_i s_i^{out} = \delta + 1$. At this level of approximation, the exponent $\theta$
varies between $m/(m+1)$ and $1$ and the power-law exponent thus varies between
$2$ ($\delta \to \infty$) and $2 + 1/m$ ($\delta = 0$). This result points out that the model predicts
an exponent $\gamma \simeq 2$ for reasonable values of the out-degree, in agreement with
the empirical findings.
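
The approximate exponents can be evaluated directly from Eqs. (11) and (13). The short sketch below assumes the approximation $A \approx \langle w \rangle = 1 + \delta$ discussed above (not a measured slope), and the function name `approximate_exponents` is chosen only for this illustration.

```python
def approximate_exponents(delta, m):
    """Return (theta, gamma) under the approximation A = <w> = 1 + delta."""
    A = 1.0 + delta                        # approximation A ~ <w>
    theta = A / (delta + 1.0 + 1.0 / m)    # Eq. (11)
    gamma = 1.0 + 1.0 / theta              # Eq. (13)
    return theta, gamma

theta0, gamma0 = approximate_exponents(0.0, 2)   # delta = 0 gives gamma = 2 + 1/m
```

Because the value of $A$ measured in a simulation generally differs from $1 + \delta$, exponents obtained this way are only a first estimate.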



3 Numerical simulations 



Along with the previous analytical discussion we have performed numerical sim- 
ulations of the presented graph model in order to investigate its topological 
properties with a direct statistical analysis. 



3.1 Degree and strength distributions 

As a first test of the analytical framework we confirm numerically that $s^{in}$, $k^{in}$,
$s^{out}$ are indeed proportional and grow as power-laws of time during the
construction of the network [see Fig. 2]. The measure of the proportionality factor
$A$ between $s^{in}$ and $k^{in}$ allows us to compute the exponents $\theta$ and $\gamma$, which are
satisfactorily close to the observed results and to the theoretical predictions obtained
with the approximation $A \approx \langle w \rangle$. Figure 2 shows the probability
distributions of the relevant quantities ($k^{in}$, $w$, $s^{in}$, $s^{out}$) for $\delta = 0.5$. All these quantities
are broadly distributed according to power-laws with the same exponent. It is
also important to stress that the out-traffic is broadly distributed even if the
out-degree is not.




Fig. 2. Top left: illustration of the proportionality between $s^{in}$ and $k^{in}$ for
various values of $\delta$. Bottom left: theoretical approximate estimate of the exponent
$\gamma_s^{in} = \gamma_s^{out} = \gamma_k^{in}$ vs. $\delta$ for various values of $m$. Right: Probability distributions of
$k^{in}$, $w$, $s^{in}$, $s^{out}$ for $\delta = 0.5$, $m = 2$ and $N = 10^5$. The dashed lines correspond
to a power law with exponent $\gamma = 2.17$ obtained by measuring first the slope $A$
of $s^{in}$ vs. $k^{in}$ and then using equations (11) and (13) to compute $\gamma$.



3.2 Clustering and hierarchies 

Along with the vertices hierarchy imposed by the strength distributions the 
WWW displays also a non-trivial architecture which reflects the existence of well 
defined groups or communities and of other administrative and social factors. 
In order to uncover these structures a first characterization can be done at the 
level of the undirected graph representation. In this graph, the degree of a node 
is the sum of its in- and out-degree ($k_i = k_i^{in} + k_i^{out}$) and the total strength is
the sum of its in- and out-strength ($s_i = s_i^{in} + s_i^{out}$). A very useful quantity is
then the clustering coefficient $c_i$ that measures the local group cohesiveness and
is defined for any vertex i as the fraction of connected neighbors couples of i [HJ- 
The average clustering coefficient $C = N^{-1} \sum_i c_i$ thus expresses the statistical
level of cohesiveness by measuring the global density of interconnected vertex
triplets in the network. Further information can be gathered by inspecting the
average clustering coefficient $C(k)$ restricted to classes of vertices with degree
$k$:

$$ C(k) = \frac{1}{N_k} \sum_{i / k_i = k} c_i, \qquad (14) $$

where $N_k$ is the number of vertices with degree $k$. In real WWW data, it has been
observed that the $k$ spectrum of the clustering coefficient has a highly non-trivial
behavior with a power-law decay as a function of $k$, signaling a hierarchy in which
low degree vertices belong generally to well interconnected communities (high
clustering coefficient) while hubs connect many vertices that are not directly
connected (small clustering coefficient) [34,35].
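
The degree-resolved clustering spectrum $C(k)$ can be computed from an undirected adjacency structure as in the sketch below. The function name `clustering_spectrum`, the input layout (a dictionary mapping each vertex to the set of its neighbours) and the convention $c_i = 0$ for vertices with fewer than two neighbours are assumptions made for this illustration.

```python
from collections import defaultdict
from itertools import combinations

def clustering_spectrum(adj):
    """Return {k: C(k)} for an undirected graph.

    adj: dict node -> set of neighbours (symmetric).
    """
    by_degree = defaultdict(list)
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            c = 0.0   # convention (assumption): no pair of neighbours to connect
        else:
            # count the pairs of neighbours of i that are themselves linked
            links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
            c = 2.0 * links / (k * (k - 1))
        by_degree[k].append(c)
    # average c_i over the vertices of each degree class
    return {k: sum(cs) / len(cs) for k, cs in by_degree.items()}

# Toy example: a triangle (0, 1, 2) with a pendant vertex 3 attached to 0.
toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
spectrum = clustering_spectrum(toy)
```

In the toy graph the degree-2 vertices sit inside the triangle and are fully clustered, while the degree-3 vertex has only one connected pair among its three pairs of neighbours.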




Fig. 3. Left: Clustering coefficient $C(k)$, for various values of the parameter $\delta$.
Here $m = 2$ and $N = 10^5$. The clustering increases with $\delta$. Right:
Correlations between degrees of neighbouring vertices as measured by $k_{nn}(k)$ (crosses),
$k_{nn,in}^{in}(k^{in})$ (circles) and $k_{nn,out}^{in}(k^{in})$ (squares); $m = 2$, $\delta = 0.5$ and $N = 10^5$.



We show in Figure 3 the clustering coefficient $C(k)$ for the model we propose,
for various values of $\delta$. We obtain a decreasing function of the degree $k$, in
agreement with real data observation. In addition, the range of variations spans
several orders of magnitude indicating a continuum hierarchy of cohesiveness
levels as in the analysis of Ref. [35].

Another important source of information about the network structural
organization lies in the correlations of the connectivities of neighboring vertices [36].
Correlations can be probed by inspecting the average degree of the nearest neighbors
of a vertex $i$,

$$ k_{nn,i} = \frac{1}{k_i} \sum_j k_j, \qquad (15) $$

where the sum runs over the nearest neighbors of vertex $i$. From
this quantity a convenient measure to investigate the behavior of the degree
correlation function is obtained by the average degree of the nearest neighbors,
$k_{nn}(k)$, for vertices of degree $k$:

$$ k_{nn}(k) = \frac{1}{N_k} \sum_{i / k_i = k} k_{nn,i}. \qquad (16) $$

This last quantity is related to the correlations between the degrees of connected
vertices since on average it can be expressed as

$$ k_{nn}(k) = \sum_{k'} k' P(k'|k). \qquad (17) $$

If degrees of neighboring vertices are uncorrelated, $P(k'|k)$ is only a function of
$k'$ and thus $k_{nn}(k)$ is a constant. When correlations are present, two main classes
of possible correlations have been identified: assortative behavior if $k_{nn}(k)$
increases with $k$, which indicates that large degree vertices are preferentially
connected with other large degree vertices, and disassortative behavior if $k_{nn}(k)$
decreases with $k$ [37].
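
Equations (15) and (16) translate into a short computation on an undirected adjacency structure. The sketch below is illustrative only; the function name `knn_spectrum` and the input layout (a dictionary mapping each vertex to the set of its neighbours) are assumptions.

```python
from collections import defaultdict

def knn_spectrum(adj):
    """Return {k: k_nn(k)} for an undirected graph given as node -> set of neighbours."""
    by_degree = defaultdict(list)
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k == 0:
            continue                                    # isolated vertices have no neighbours
        knn_i = sum(len(adj[j]) for j in nbrs) / k      # Eq. (15)
        by_degree[k].append(knn_i)
    return {k: sum(v) / len(v) for k, v in by_degree.items()}   # Eq. (16)

# Toy example: a star with hub 0 and three leaves.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
knn = knn_spectrum(star)
```

The star is an extreme disassortative case: the low-degree leaves see a high-degree neighbour, while the hub sees only degree-1 neighbours, so $k_{nn}(k)$ decreases with $k$.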

In the case of the WWW, however, the study of additional correlation
functions is naturally introduced by the directed nature of the graph. We focus on
the most significant, the in-degree of vertices, which in our model is a measure
of their popularity ($s^{in} \sim k^{in}$). As for the undirected correlation, we can study
the average in-degree of in-neighbours:

$$ k_{nn,in}^{in}(i) = \frac{1}{k_i^{in}} \sum_{j \in V_{in}(i)} k_j^{in}. \qquad (18) $$

This quantity measures the average in-degree of the in-neighbours of $i$, i.e. whether the
pages pointing to a given page $i$ are popular in turn. Moreover, relevant
information comes also from

$$ k_{nn,out}^{in}(i) = \frac{1}{k_i^{out}} \sum_{j \in V_{out}(i)} k_j^{in}, \qquad (19) $$

which measures the average in-degree of the out-neighbours of $i$, i.e. the
popularity of the pages to which page $i$ is pointing. Finally, in both cases it is possible
to look at the average of this quantity for groups of vertices with in-degree $k^{in}$
in order to study the possible assortative or disassortative behavior.
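
The two directed quantities described above, the average in-degree of the in-neighbours and of the out-neighbours of each page, can be computed from a list of hyperlinks as in the following sketch. The function name `directed_knn` and the edge-list input are assumptions made for this illustration; pages without in-links (respectively out-links) are simply left out of the corresponding result.

```python
from collections import defaultdict

def directed_knn(edges):
    """edges: iterable of (i, j) pairs, one per hyperlink i -> j.

    Returns (k_in, knn_in, knn_out), each a dict keyed by page.
    """
    in_nbrs, out_nbrs = defaultdict(set), defaultdict(set)
    for i, j in edges:
        out_nbrs[i].add(j)
        in_nbrs[j].add(i)
    nodes = set(in_nbrs) | set(out_nbrs)
    k_in = {v: len(in_nbrs.get(v, ())) for v in nodes}
    knn_in, knn_out = {}, {}
    for v in nodes:
        if k_in[v] > 0:
            # average in-degree of the pages pointing to v
            knn_in[v] = sum(k_in[u] for u in in_nbrs[v]) / k_in[v]
        outs = out_nbrs.get(v, ())
        if outs:
            # average in-degree of the pages v points to
            knn_out[v] = sum(k_in[u] for u in outs) / len(outs)
    return k_in, knn_in, knn_out

# Toy example with three pages.
k_in, knn_in, knn_out = directed_knn([(1, 0), (2, 0), (2, 1)])
```

Averaging `knn_in` and `knn_out` over pages with the same in-degree then gives the spectra as functions of $k^{in}$.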

In Figure 3 we report the spectrum of $k_{nn}(k)$, $k_{nn,in}^{in}(k^{in})$ and $k_{nn,out}^{in}(k^{in})$ in
graphs generated with the present weighted model. The undirected correlations
display a strong disassortative behaviour with $k_{nn}$ decreasing as a power-law.
This is a common feature of most technological networks which present a
hierarchical structure in which small vertices connect to hubs. The model defined here
spontaneously exhibits the hierarchical construction that is observed in real
technological networks and the WWW. In contrast, both $k_{nn,in}^{in}(k^{in})$ and $k_{nn,out}^{in}(k^{in})$
show a rather flat behavior signaling an absence of strong correlations. This
indicates a lack of correlations in the popularity, as measured by the in-degree.
The absence of correlations in the behaviour of $k_{nn,out}^{in}(k^{in})$ is a realistic feature
since in the real WWW, vertices tend to point to popular vertices independently
of their in-degree. We also note that $k_{nn,out}^{in}(k^{in}) \gg k_{nn,in}^{in}(k^{in})$, a signature of
the fact that the average in-degree of pointed vertices is much higher than the
average in-degree of pointing vertices. This result is also a reasonable feature of
the real WWW since the popularity of the webpages to which any vertex
is pointing is on average larger than the popularity of pointing webpages, which
include also the non-popular ones.

Finally, we would like to stress that in our model the degree correlations 
are to a certain extent a measure of popularity correlations and more refined 
measurements will be provided by the correlations among the actual popularity 
as measured by the in-strength of vertices. We defer the detailed analysis of these 
properties to a future publication, but at this stage, it is clear that an empirical
analysis of the hyperlinks traffic is strongly needed in order to discuss in detail
the WWW architecture.

4 Conclusion 

We have presented a model for the WWW that considers the interplay between 
the topology and the traffic dynamical evolution when new web-pages and
hyper-links are created. This simple mechanism produces a non-trivial complex and
scale-free behavior depending on the physical parameter $\delta$ that controls the local
microscopic dynamics. We believe that the present model might provide a general
starting point for the realistic modeling of the Web by taking into account the 
coupling of its two main complex features, its topology and its traffic. 